An Integrated Architecture for Exploring, Wrapping, Mediating and Restructuring Information from the Web

Author

  • Wolfgang May
Abstract

The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources. A main problem with current wrapper/mediator approaches is that they rely on very different formalisms and tools for wrappers and mediators, which leads to an “impedance mismatch” between the wrapper and mediator levels. Additionally, most current approaches are tailored to access information from a fixed set of sources. In this paper, we discuss an architecture in which Web exploration, wrapping, mediation, and querying are done in an integrated system. Such an architecture shows significant advantages when combined with a unified framework (i.e., a data model and a language) in which all tasks are done. Our approach is based on a unified model of the application-level information and the relevant fragment of the Web, and on an integrated language for accessing the Web, wrapping, mediating, and querying information. In contrast to other approaches, the relevant part of the Web becomes a part of the internal world model of the system. This allows for data-driven Web exploration that is independent of a given network of individual predefined wrappers and mediators. Thus, in addition to the classical wrapping and mediating functionality, a system in this architecture can be equipped with Web navigation and exploration functionality. In an abstract sense, the system comprises a universal wrapper that can be applied to arbitrary Web data sources that become known to the system during information processing. Equipped with suitably intelligent rules, the system can potentially explore previously unknown parts of the Web, thus coping with the steady growth of the Web. The architecture is implemented in the FLORID system [17].
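The idea of keeping Web documents and application-level data in one world model, and letting rules discover new sources as data arrives, can be illustrated with a minimal Python sketch. This is not FLORID code and does not reproduce the paper's language; the `WorldModel` class, the `explore` and `wrap` rules, the `FAKE_WEB` dictionary, and the `key=value` page format are all hypothetical names and assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the Web: URL -> (page text, outgoing links).
FAKE_WEB = {
    "http://example.org/index": ("country list",
                                 ["http://example.org/de", "http://example.org/fr"]),
    "http://example.org/de": ("name=Germany;capital=Berlin", []),
    "http://example.org/fr": ("name=France;capital=Paris", ["http://example.org/de"]),
}


def fetch(url):
    """Simulated Web access; unknown URLs yield an empty page."""
    return FAKE_WEB.get(url, ("", []))


@dataclass
class WorldModel:
    """One store for both Web-level and application-level facts."""
    facts: set = field(default_factory=set)  # (subject, property, value) triples

    def add(self, s, p, v):
        """Insert a fact and report whether it was new."""
        before = len(self.facts)
        self.facts.add((s, p, v))
        return len(self.facts) > before


def explore(model, fetch_fn):
    """Exploration rule: fetch every known URL that has no content yet."""
    changed = False
    known = {s for (s, p, v) in model.facts if p == "isa" and v == "url"}
    fetched = {s for (s, p, v) in model.facts if p == "content"}
    for url in sorted(known - fetched):
        text, links = fetch_fn(url)
        changed |= model.add(url, "content", text)
        for link in links:
            # Newly seen links become objects of the same model.
            changed |= model.add(url, "links_to", link)
            changed |= model.add(link, "isa", "url")
    return changed


def wrap(model):
    """Wrapping rule: turn key=value page content into application facts."""
    changed = False
    for (url, p, text) in list(model.facts):
        if p != "content" or "=" not in text:
            continue
        fields = dict(pair.split("=", 1) for pair in text.split(";"))
        if "name" in fields and "capital" in fields:
            changed |= model.add(fields["name"], "capital", fields["capital"])
            changed |= model.add(fields["name"], "source", url)
    return changed


def run(seed_url, fetch_fn):
    """Apply both rules repeatedly until no new fact is derived."""
    model = WorldModel()
    model.add(seed_url, "isa", "url")
    while True:
        changed = explore(model, fetch_fn)
        changed = wrap(model) or changed  # always evaluate both rules
        if not changed:
            return model


def query_capitals(model):
    """Application-level query over the same fact store."""
    return sorted((s, v) for (s, p, v) in model.facts if p == "capital")
```

Calling `run("http://example.org/index", fetch)` starts from a single seed URL. The pages for Germany and France are not configured anywhere in advance; they are reached only because the links found on the seed page become facts in the model, which is the sense in which the exploration in this sketch is data-driven.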


Similar resources

Information Extraction from the Web

The goal of information extraction from the Web is to provide an integrated view on data from autonomous heterogeneous information sources. The main problem with current wrapper/mediator approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an impedance mismatch between the wrapper and mediator level. Additionally, most approaches nowadays a...


Semi-Automatic Creation of Enterprise Mashups Using Semantic Descriptions

Mashups are the next generation of web applications. A mashup is a lightweight web application that is created by combining information or capabilities from more than one existing resource to deliver a new and integrated experience to the user. Mashups introduce a new class of integration techniques in enterprises for implementing situational applications (i.e. applications that come together to s...


Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels

The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...


Providing an Enterprise Architecture Framework Model for Laboratory Information Management Systems by Service Oriented Approach

Background and Aim: Laboratories are among the most important scientific and research centers. Laboratory information management systems provide a platform for recording information and for collaboration between researchers. The main purpose of this study was to suggest an organizational architecture model for laboratory information management systems. Materials and Methods: This study was a ...




Publication date: 2000